252
17
Genomics
nucleotides are fluorescently labelled dideoxynucleotides lacking the hydroxyl group
necessary for chain extension. Hybridization of the primer to the marker initiates
DNA polymerization templated by the unknown sequence. Whenever one of the
dideoxynucleotides is incorporated, extension of that chain is terminated. After the
system has been allowed to run for a time, such that all possible lengths may be
presumed to have been synthesized, the DNA is denatured into single strands, which are
then separated by size electrophoretically on a gel. The electrophoretogram (sometimes referred
to as an electropherogram) shows successive peaks differing in size by one nucleotide.
Since the dideoxynucleotides are labelled with a different fluorophore for each base,
the successive nucleotides in the unknown sequence can be read off by observing
the fluorescence of the consecutive peaks.
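The read-out logic described above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions: every possible chain length is present exactly once, the gel resolves single-nucleotide differences perfectly, and the function names (`terminated_fragments`, `read_ladder`, `infer_template`) are hypothetical names chosen for illustration.

```python
# Toy model of reading a chain-termination ladder.
# Assumption: the template is written in the order in which the polymerase reads it.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def terminated_fragments(template):
    """Return one (length, terminal_label) pair per possible terminated chain.

    A chain of length n ends in the labelled dideoxynucleotide that is
    complementary to the n-th base of the template.
    """
    synthesized = "".join(COMPLEMENT[base] for base in template)
    return [(n, synthesized[n - 1]) for n in range(1, len(synthesized) + 1)]


def read_ladder(fragments):
    """Sort fragments by length (the role of electrophoresis) and read the labels."""
    return "".join(label for _, label in sorted(fragments))


def infer_template(read):
    """The unknown template is the complement of the synthesized strand."""
    return "".join(COMPLEMENT[base] for base in read)


if __name__ == "__main__":
    fragments = terminated_fragments("TACGGT")
    fragments.reverse()  # the order in the reaction tube is irrelevant
    ladder = read_ladder(fragments)
    print(ladder, infer_template(ladder))
```

The sort by fragment length stands in for the electrophoretic separation; the label of each successive peak then gives the next base of the synthesized strand, whose complement is the unknown sequence.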
A useful approach for very long unknown sequences (such as whole genomes)
is to randomly fragment the entire genome (e.g., using ultrasound). The fragments,
each approximately two kilobases long and sufficient to cover the genome fivefold
to tenfold, are cloned into a plasmid vector, 4 introduced into bacteria and
multiplied. The extracted and purified DNA fragments are then sequenced as above.
The presence of overlaps allows the original sequence to be reconstructed. 5 This
method is usually called shotgun sequencing. 6 Of course, overlaps are not guar-
anteed, but gaps can be filled in principle by conventional sequencing. 7 The rival
method is called bacterial artificial chromosome (BAC) assembly, 8 in which large
fragments of DNA are cloned into BACs (plasmid-derived vectors able to carry large inserts); the cloned fragments
are then individually sequenced and combined into a single sequence. Being more precise and
producing a more contiguous sequence than the shotgun method, BAC assembly is
often used to assemble large genomes and can be used for the analysis of complex
genetic structures.
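How overlaps allow the original sequence to be reconstructed from shotgun fragments can be sketched with a greedy merge of the best-overlapping pair of reads. This is a minimal illustration, not the procedure used by any production assembler: it assumes error-free reads, all from the same strand, with no repeats, and the names `overlap`, `drop_contained` and `greedy_assemble` as well as the `min_len` threshold are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is also a prefix of b.

    Overlaps shorter than min_len are ignored (returned as 0).
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def drop_contained(seqs):
    """Remove duplicates and any sequence wholly contained in another."""
    unique = list(dict.fromkeys(seqs))
    return [s for s in unique if not any(s != t and s in t for t in unique)]


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest overlap.

    Returns the list of contigs left when no overlap of at least
    min_len remains; more than one contig means there are gaps.
    """
    contigs = drop_contained(reads)
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        rest = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs = drop_contained(rest + [merged])


if __name__ == "__main__":
    genome = "ATGGCGTACGTTAGCCTAGGA"
    reads = [genome[10:21], genome[0:10], genome[5:15]]
    print(greedy_assemble(reads))
```

When the reads do not overlap, the function returns several separate contigs, which corresponds to the gaps mentioned above that have to be filled by conventional sequencing.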
Every aspect of sequencing (reagents, procedures, separation methods, etc.) has,
of course, been subject to much development and improvement since its invention
(in Sanger’s original method, the dideoxynucleotides were radioactively labelled),
and there are now high-throughput automated methods in routine use.
Another popular technique is pyrosequencing, whereby only one kind of nucleotide
at a time is offered to the growing complementary chain; if it is complementary
to the unknown sequence at the current position, it is incorporated and
pyrophosphate is released. An auxiliary enzyme (ATP sulfurylase) converts the pyrophosphate
to ATP, which is then consumed by the chemiluminescent enzyme
luciferase in oxidizing its substrate luciferin, yielding a brief pulse of detectable light. The technique is suitable for
automation. It is, however, practically limited to sequencing strands shorter than
about 150 base pairs.
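The cyclic dispensing of nucleotides in pyrosequencing can be mimicked by a short simulation. It is a sketch under idealized assumptions: the light signal is taken to be exactly proportional to the number of nucleotides incorporated in one dispensation (so a run of identical bases gives a proportionally larger pulse), there is no noise, and the function names `pyrogram` and `decode` and the dispensation order "ACGT" are hypothetical choices for illustration.

```python
# Toy simulation of pyrosequencing signals.
# Assumption: the template is written in the order in which the polymerase reads it.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def pyrogram(template, dispensation="ACGT"):
    """Return a list of (nucleotide, signal) pairs, one per dispensation.

    signal is the number of nucleotides incorporated, i.e. an idealized
    light intensity; 0 means no pyrophosphate was released.
    """
    if set(dispensation) != set("ACGT"):
        raise ValueError("dispensation order must contain A, C, G and T")
    signals = []
    pos = 0        # current position on the template
    step = 0       # number of dispensations made so far
    while pos < len(template):
        nucleotide = dispensation[step % len(dispensation)]
        step += 1
        count = 0
        # Incorporate as long as the offered nucleotide pairs with the template.
        while pos < len(template) and COMPLEMENT[template[pos]] == nucleotide:
            count += 1
            pos += 1
        signals.append((nucleotide, count))
    return signals


def decode(signals):
    """Rebuild the synthesized strand from the sequence of light pulses."""
    return "".join(nucleotide * count for nucleotide, count in signals)


if __name__ == "__main__":
    signals = pyrogram("TTGCA")
    print(signals)
    print(decode(signals))
```

The decoded string is the synthesized strand; the unknown sequence is its complement.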
New techniques are constantly being developed, with special interest being shown
in single-molecule sequencing, which would obviate the need for amplification of
4 In this context, “vector” is used in the sense of vehicle.
5 This is somewhat related to Kruskal’s multidimensional scaling (MD-SCAL or MDS) analysis.
6 Venter et al. (2001).
7 Unambiguously assembled nonoverlapping sequences are called “contigs”.
8 IHGSC (2001).